Bulkloading and Maintaining XML Documents

Author: Kersten M.L. (Martin)
Schmidt A.R.
Publication venue: 'American College of Medical Physics (ACMP)'
Publication date: 01/01/2002

The popularity of XML as an exchange and storage format brings about massive amounts of documents to be stored, maintained and analyzed, a challenge that has traditionally been tackled with Database Management Systems (DBMS). To open up the content of XML documents to analysis with declarative query languages, efficient bulk loading techniques are necessary. Database technology has traditionally supported these tasks, yet it falls short of providing efficient automation techniques for the challenges that large collections of XML data raise. As a storage back-end, many applications rely on relational databases, which are designed for large data volumes. This paper studies bulk load and update algorithms for XML data stored in relational format and outlines opportunities and problems. We investigate both (1) bulk insertion and deletion and (2) updates in the form of edit scripts, which rely heavily on pointer-chasing techniques that are often considered orthogonal to the algebraic operations relational databases are optimized for. To get the most out of relational database systems, we show that edit scripts should be used with care and replaced by bulk operations if more than a very small portion of the database is updated. We implemented our ideas on top of the Monet Database System and benchmarked their performance.
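The contrast between set-oriented bulk operations and node-at-a-time edit scripts can be made concrete with a small relational encoding. The sketch below assumes a pre/size numbering (each element stored as its document-order rank and its number of descendants), which is one common way to shred XML into a table but not necessarily the schema used in the paper's Monet implementation; SQLite stands in for the relational back-end, and all function and table names are hypothetical. Deleting a subtree then becomes one range predicate instead of a recursive walk over child pointers.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import List, Tuple

Row = Tuple[int, int, str]  # (pre, size, tag): a hypothetical pre/size encoding


def shred(xml_text: str) -> List[Row]:
    """Number elements in document order; size = number of descendants."""
    rows: List[Row] = []

    def visit(elem: ET.Element) -> None:
        pre = len(rows)
        rows.append((pre, 0, elem.tag))  # placeholder until the subtree is done
        for child in elem:
            visit(child)
        rows[pre] = (pre, len(rows) - pre - 1, elem.tag)

    visit(ET.fromstring(xml_text))
    return rows


def bulk_load(conn: sqlite3.Connection, rows: List[Row]) -> None:
    """Insert all rows with one batched statement inside one transaction."""
    conn.execute("CREATE TABLE node (pre INTEGER PRIMARY KEY, size INTEGER, tag TEXT)")
    with conn:
        conn.executemany("INSERT INTO node VALUES (?, ?, ?)", rows)


def bulk_delete_subtree(conn: sqlite3.Connection, pre: int) -> int:
    """Delete a node and all its descendants with a single range predicate.

    Gaps left in the pre numbering are harmless: every row inside an
    ancestor's range (pre, pre + size] is still one of its descendants.
    """
    (size,) = conn.execute("SELECT size FROM node WHERE pre = ?", (pre,)).fetchone()
    with conn:
        cur = conn.execute("DELETE FROM node WHERE pre BETWEEN ? AND ?", (pre, pre + size))
    return cur.rowcount  # number of nodes removed
```

A node-by-node edit script would instead visit each descendant and issue one delete per node; the range form lets the database process the whole subtree as a single set operation, which is the kind of rewriting the abstract recommends once updates touch more than a small fraction of the data.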

CWI's Institutional Repository

Acoi: A System for Indexing Multimedia Objects

Author: Kersten M.L. (Martin)
Schmidt A.R.
Windhouwer M.A. (Menzo)
Publication date: 01/01/2000

The explosion in the number of Web pages also leads to countless accessible multimedia objects. Their abundance makes the Internet an interesting application domain for multimedia retrieval systems. Many search engines are starting to offer some functionality for retrieving these objects independently. However, most of these multimedia search engines aim at a fixed set of multimedia index attributes. The Acoi system provides an extensible framework for retrieving multimedia objects of any type on the basis of their content (both low-level features and high-level concepts) and their context.

CWI's Institutional Repository

Storing XML Documents in Databases

Author: Kersten M.L. (Martin)
Manegold S. (Stefan)
Schmidt A.R.
Publication venue: Idea Group Publishing
Publication date: 01/01/2005

The authors introduce concepts for loading large amounts of XML documents into databases where the documents are stored and maintained. The goal is to make XML databases as unobtrusive in multi-tier systems as possible while providing as many of the services defined by the XML standards as possible. The ubiquity of XML has sparked great interest in deploying concepts known from Relational Database Management Systems, such as declarative query languages, transactions, indexes and integrity constraints. This chapter describes how bulkloading is done in Monet XML, a main-memory XML database system, and evaluates the cost of bulkloading and bulk deletion relative to strategies based on the insertion and deletion of individual nodes. Additionally, we survey the applicability of the techniques to a wider class of XML storage schemas.
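A bulkloader can decompose a document in a single streaming pass, keeping only a stack of currently open elements. The sketch below assumes a path-partitioned storage scheme in which every distinct root-to-node tag path gets its own (parent, child) edge relation, in the spirit of Monet XML's path-based decomposition; the exact layout, the function name `bulkload` and the object-id scheme are illustrative assumptions, and text nodes and attributes are omitted for brevity.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple
from xml.etree.ElementTree import XMLPullParser

Edge = Tuple[Optional[int], int]  # (parent oid, child oid); the root has parent None


def bulkload(xml_text: str) -> Dict[str, List[Edge]]:
    """Hypothetical sketch: decompose a document into per-path edge relations."""
    parser = XMLPullParser(events=("start", "end"))
    parser.feed(xml_text)
    parser.close()  # queued events remain readable after close()

    relations: Dict[str, List[Edge]] = defaultdict(list)
    stack: List[Tuple[int, str]] = []  # (oid, path) of currently open elements
    next_oid = 0
    for event, elem in parser.read_events():
        if event == "start":
            parent_oid, parent_path = stack[-1] if stack else (None, "")
            path = f"{parent_path}/{elem.tag}"
            relations[path].append((parent_oid, next_oid))  # append-only, in document order
            stack.append((next_oid, path))
            next_oid += 1
        else:  # "end": the element's subtree is complete
            stack.pop()
    return dict(relations)
```

Because edges are appended to their relations in document order, each relation can be written out as one batch at the end of the pass, in contrast to inserting individual nodes into already populated tables.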

Crossref

CWI's Institutional Repository

International Migration, Integration and Social Cohesion online publications

Indexing real-world data using semi-structured documents

Author: Kersten M.L. (Martin)
Schmidt A.R.
Windhouwer M.A. (Menzo)
Publication venue: CWI
Publication date: 01/01/1999

We address the problem of deriving meaningful semantic index information for a multimedia database using a semi-structured document model. We show how our framework, called feature grammars, can be used to (1) exploit third-party interpretation modules for real-world unstructured components, and (2) use context-free grammars to convert such poorly structured or unstructured input into semi-structured output. The basic idea is to enrich context-free grammars with special symbols called detectors, which provide the necessary structure just-in-time to satisfy a parser look-ahead. A prototype implementation has been constructed in the Acoi project to demonstrate the feasibility of this approach for indexing both images and audio documents.
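The detector idea can be illustrated with a toy backtracking parser in which some grammar symbols are not matched against input tokens but computed by calling a function on the raw object. This is a hypothetical Python sketch, not Acoi's feature grammar engine or notation: the grammar, the detector names (`is_photo`, `faces`, `colors`) and the object fields are invented for illustration. It shows the just-in-time aspect: a detector runs only when the parser actually reaches its symbol.

```python
from typing import Any, Callable, Dict, List, Optional

Grammar = Dict[str, List[List[str]]]          # non-terminal -> ordered alternatives
Detectors = Dict[str, Callable[[dict], Any]]  # detector symbol -> function (hypothetical)


def parse(symbol: str, grammar: Grammar, detectors: Detectors,
          obj: dict, log: List[str]) -> Optional[dict]:
    """Derive `symbol` for `obj`; returns a nested dict or None on failure."""
    if symbol in detectors:
        log.append(symbol)                # the detector is invoked only now, on demand
        value = detectors[symbol](obj)
        return None if value is None else {symbol: value}
    for alternative in grammar[symbol]:   # try alternatives in order
        children = []
        for sym in alternative:
            node = parse(sym, grammar, detectors, obj, log)
            if node is None:              # detector rejected: backtrack to next alternative
                break
            children.append(node)
        else:                             # every symbol of this alternative succeeded
            return {symbol: children}
    return None
```

With a grammar such as `{"image": [["photo"], ["graphic"]], "photo": [["is_photo", "faces"]], "graphic": [["colors"]]}`, parsing a graphic never invokes `faces`, because the `photo` alternative is abandoned as soon as `is_photo` fails; the returned nested dictionary plays the role of the semi-structured output.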

CWI's Institutional Repository

International Migration, Integration and Social Cohesion online publications

Querying XML Documents Made Easy: Nearest Concept Queries

Author: Kersten M.L. (Martin)
Schmidt A.R.
Windhouwer M.A. (Menzo)
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2001

Due to the ubiquity and popularity of XML, users are often in the following situation: they want to query XML documents that contain potentially interesting information, but they are unaware of the mark-up structure that is used. For example, it is easy to guess the contents of an XML bibliography file, whereas the mark-up depends on the methodological, cultural and personal background of the author(s). Nonetheless, it is this hierarchical structure that forms the basis of XML query languages. In this paper we exploit the tree structure of XML documents to equip users with a powerful tool, the meet operator, which lets them query databases whose content they are familiar with, without requiring knowledge of tags and hierarchies. Our approach is based on computing the lowest common ancestor of nodes in the XML syntax tree: e.g., given two strings, we look for nodes whose descendants contain both strings. The novelty of this approach is that the result type is unknown at query formulation time and depends on the database instance. If the two strings are an author's name and a year, mainly publications of that author in that year are returned. If the two strings are numbers, the result mostly consists of publications that have the numbers as year or page numbers. Because the result type of a query is not specified by the user, we refer to the lowest common ancestor as the nearest concept. We also present a running example taken from the bibliography domain and demonstrate that the operator can be implemented efficiently.
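A naive version of the meet operator follows directly from the description: find the elements whose text contains each string, then compute the lowest common ancestor of every pair. The Python sketch below uses `xml.etree.ElementTree` and a parent map; it is quadratic in the number of matches and only illustrates the semantics, not the efficient implementation the paper refers to. The function name `meet` and the sample document are illustrative.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def meet(root: ET.Element, s1: str, s2: str) -> List[ET.Element]:
    """Naive sketch: lowest common ancestors of elements containing s1 and s2."""
    # Map each element (by identity) to its parent; the root has no entry.
    parent: Dict[int, ET.Element] = {id(c): p for p in root.iter() for c in p}

    def ancestors_or_self(e: ET.Element) -> List[ET.Element]:
        chain = [e]
        while id(chain[-1]) in parent:
            chain.append(parent[id(chain[-1])])
        return chain

    def containing(s: str) -> List[ET.Element]:
        return [e for e in root.iter() if e.text and s in e.text]

    result: List[ET.Element] = []
    seen = set()
    for a in containing(s1):
        up_a = {id(x) for x in ancestors_or_self(a)}
        for b in containing(s2):
            # First ancestor-or-self of b that is also an ancestor-or-self of a.
            lca = next(x for x in ancestors_or_self(b) if id(x) in up_a)
            if id(lca) not in seen:
                seen.add(id(lca))
                result.append(lca)
    return result
```

On a bibliography with two `article` entries, an author and a year from the same entry meet at that `article` element, while an author and a year from different entries meet only at the document root, so the kind of node returned depends on the data rather than on the query.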

CWI's Institutional Repository

International Migration, Integration and Social Cohesion online publications

Transient effects in fission evidenced from new experimental signatures

Author: Benlliure J.
Enqvist T.
Gesellschaft fuer Schwerionenforschung mbH Darmstadt (Germany)
Junghans A.R.
Jurado B.
Kelic A.
Rejmund F.
Schmidt K.H.
Schmitt C.
Publication venue: 'American Physical Society (APS)'
Publication date: 01/01/2004

A new experimental approach is introduced to investigate the relaxation of the nuclear deformation degrees of freedom. Highly excited fissioning systems with compact shapes and low angular momenta are produced in peripheral relativistic heavy-ion collisions. Both fission fragments are identified in atomic number. Fission cross sections and fission-fragment element distributions are determined as a function of the fissioning element. From a comparison of these new observables with a nuclear-reaction code, a value for the transient time is deduced. Comment: 6 pages, 2 figures, background information at http://www-w2k.gsi.de/kschmidt

arXiv.org e-Print Archive

A Look Back on the XML Benchmark Project

Author: Kersten M.L. (Martin)
Manegold S. (Stefan)
Schmidt A.R.
Waas F.
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2003

The XML Benchmark Project was started to provide a framework for evaluating the interplay of XML technologies and Database Management Systems. The benchmark places emphasis on engineering aspects as well as on the performance of the query processor. In this chapter the authors present a quick overview of the benchmark and point out some of the experience they gathered while designing the benchmark and running it on a variety of platforms. Since the benchmark was designed early in the evolution of XML, their experiences also reflect how the perception of XML changed during the three years since they started working on the subject. The chapter comprises an overview of the benchmark as well as a discussion of some lessons learned.

CWI's Institutional Repository

The XML benchmark project

Author: Carey M.J.
Florescu D.
Kersten M.L. (Martin)
Schmidt A.R.
Waas F.
Publication venue: CWI
Publication date: 01/01/2001

With standardization efforts for an XML query language drawing to a close, researchers and users increasingly focus their attention on the database technology that has to meet the new challenges posed to data management by the sheer amount of XML documents that applications produce: validation, performance evaluation and optimization of XML query processors are the upcoming issues. Following a long tradition in database research, the XML Store Benchmark Project provides a framework to assess an XML database's ability to cope with a broad spectrum of different queries, typically posed in real-world application scenarios. The benchmark is intended to help both implementors and users compare XML databases independently of their own specific application scenario. To this end, the benchmark offers a set of queries, each of which is intended to challenge a particular primitive of the query processor or storage engine. The overall workload we propose consists of a scalable document database and a concise yet comprehensive set of queries that covers the major aspects of query processing. The queries' challenges range from stressing the textual character of the document to data analysis queries, but also include typical ad-hoc queries. We complement our research with results obtained from running the benchmark on our XML database platform. They are intended to give a first baseline, illustrating the state of the art.
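The two ingredients of such a workload, a document generator with a scale parameter and queries that each target one primitive, can be sketched as a small timing harness. Everything here is hypothetical: the document schema, the scale rule (1,000 items per unit of scale), the two queries and the names `generate`, `QUERIES` and `run` are not taken from the benchmark, and ElementTree stands in for an XML database.

```python
import time
import xml.etree.ElementTree as ET
from typing import Callable, Dict


def generate(scale: float) -> ET.Element:
    """Hypothetical synthetic document whose size grows linearly with `scale`."""
    root = ET.Element("site")
    for i in range(int(1000 * scale)):
        item = ET.SubElement(root, "item", id=f"item{i}")
        ET.SubElement(item, "name").text = f"name {i}"
        ET.SubElement(item, "price").text = str(i % 100)
    return root


# Each (hypothetical) query is meant to stress one primitive of the processor.
QUERIES: Dict[str, Callable[[ET.Element], int]] = {
    "exact match": lambda r: len(r.findall(".//item[@id='item42']")),
    "full scan + predicate": lambda r: sum(1 for p in r.iter("price") if int(p.text) > 90),
}


def run(scale: float) -> Dict[str, float]:
    """Time every query once against a document of the given scale (seconds)."""
    doc = generate(scale)
    timings: Dict[str, float] = {}
    for name, query in QUERIES.items():
        start = time.perf_counter()
        query(doc)
        timings[name] = time.perf_counter() - start
    return timings
```

Calling `run` for increasing scale factors and comparing how each timing grows is one way to see which primitive degrades first; absolute numbers depend entirely on the machine.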

CWI's Institutional Repository

Flexible and scalable digital library search

Author: Blok H.E.
Petkovic M.
Schmidt A.R.
Windhouwer M.A. (Menzo)
Zwol R. van
Publication venue: CWI
Publication date: 01/01/2001

This report describes the development of a specialised search engine for a digital library. The proposed system architecture consists of three levels: the conceptual, the logical and the physical level. By exposing a domain-specific schema, the conceptual level enables semantically rich conceptual search. The logical level provides a description language to achieve a high degree of flexibility for multimedia retrieval. The physical level takes care of scalable and efficient persistent data storage. The role played by each level changes during the various stages of a search engine's lifecycle: (1) modeling the index, (2) populating and maintaining the index and (3) querying the index. The integration of all this functionality allows conceptual and content-based querying to be combined in the query stage. A search engine for the Australian Open tennis tournament website is used as a running example, showing the power of the complete architecture and its various components.
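Combining conceptual and content-based querying can be sketched as a two-step query: a filter on conceptual attributes exposed by the domain schema, followed by a ranking on low-level content features. The `Photo` record, its fields, the function names and the use of cosine similarity over a colour histogram are assumptions made for illustration; the report's description language and storage layer are not modeled.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Photo:                      # hypothetical indexed object
    url: str
    player: str                   # conceptual-level attribute
    country: str                  # conceptual-level attribute
    histogram: Sequence[float]    # content-based feature, e.g. a colour histogram


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(index: List[Photo], country: str, example: Sequence[float], k: int = 3) -> List[Photo]:
    """Conceptual filter first, then rank the survivors by content similarity."""
    candidates = [p for p in index if p.country == country]
    candidates.sort(key=lambda p: cosine(p.histogram, example), reverse=True)
    return candidates[:k]
```

Filtering on the cheap conceptual predicate before computing similarities keeps the expensive content comparison limited to objects that are already semantically relevant.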

CWI's Institutional Repository

Non-equilibrium Wall Deposition of Inertial Particles in Turbulent Flow

Author: A. Guha
A.R. Kerstein
Alan R. Kerstein
B.Y. Liu
D.B. DeGraaff
D.D. McCoy
D.I. Graham
G. Gioia
J. Kalda
J. Young
J.B. McLaughlin
J.R. Schmidt
John R. Schmidt
Jost O. L. Wendt
M. Chen
M. Shin
P.N. Rowe
Q. Wang
R.C. Schmidt
S. Hoyas
W.S.J. Uijttewaal
Publication venue: 'Springer Science and Business Media LLC'

Crossref